Slim: Directly Mining Descriptive Patterns

Authors

  • Koen Smets
  • Jilles Vreeken
Abstract

Mining small, useful, and high-quality sets of patterns has recently become an important topic in data mining. The standard approach is to first mine many candidates and then select a good subset. However, the pattern explosion generates so many candidates that post-processing makes it virtually impossible to analyse dense or large databases in any detail. We introduce Slim, an any-time algorithm for mining high-quality sets of itemsets directly from data. We use MDL to identify the best set of itemsets as the set that describes the data best. To approximate this optimum, we iteratively use the current solution to determine which itemset would provide the most gain, estimating quality using an accurate heuristic. Because it requires no pre-mined candidate collection, Slim is parameter-free in both theory and practice. Experiments show we mine high-quality pattern sets: while evaluating orders-of-magnitude fewer candidates than our closest competitor, Krimp, we obtain much better compression ratios, closely approximating the locally-optimal strategy. Classification experiments independently verify that we characterise data very well.
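To make the idea in the abstract concrete, the following is a minimal sketch of MDL-driven itemset selection, not the authors' implementation. It assumes a simplified two-part encoding (Shannon code lengths from usage counts, with unused itemsets costing nothing), an assumed cover order (larger itemsets first), and candidates formed as unions of pairs of itemsets already in the current solution. Where the abstract describes a heuristic gain estimate, this sketch simply recomputes the exact description length for every candidate, which is only feasible on toy data. All function names (`cover`, `total_size`, `order`, `mine`) are hypothetical.

```python
from itertools import combinations
from math import log2


def cover(transaction, code_table):
    """Greedily cover a transaction with non-overlapping itemsets, in table order."""
    remaining = set(transaction)
    used = []
    for itemset in code_table:
        if itemset <= remaining:
            used.append(itemset)
            remaining -= itemset
    return used


def total_size(db, code_table, item_code_len):
    """Simplified two-part description length in bits: L(D | CT) + L(CT)."""
    usage = {s: 0 for s in code_table}
    for t in db:
        for s in cover(t, code_table):
            usage[s] += 1
    total_usage = sum(usage.values())
    bits = 0.0
    for s, u in usage.items():
        if u == 0:
            continue  # assumption: unused itemsets get no code and cost nothing
        code_len = -log2(u / total_usage)
        bits += u * code_len                                 # encoding the data
        bits += code_len + sum(item_code_len[i] for i in s)  # encoding the table
    return bits


def order(code_table):
    """Assumed cover order: larger itemsets first, ties broken lexicographically."""
    return sorted(code_table, key=lambda s: (-len(s), sorted(s)))


def mine(db):
    """Start from singletons; repeatedly add the pairwise union that compresses best."""
    db = [frozenset(t) for t in db]
    items = sorted({i for t in db for i in t})
    n_occurrences = sum(len(t) for t in db)
    # Code length of each single item, used to spell out itemsets in the table.
    item_code_len = {
        i: -log2(sum(i in t for t in db) / n_occurrences) for i in items
    }
    code_table = order([frozenset([i]) for i in items])
    best = total_size(db, code_table, item_code_len)
    while True:
        best_candidate = None
        for x, y in combinations(code_table, 2):
            candidate = x | y
            if candidate in code_table:
                continue
            trial = order(code_table + [candidate])
            size = total_size(db, trial, item_code_len)
            if size < best - 1e-9:  # tolerance guards against float noise
                best, best_candidate = size, candidate
        if best_candidate is None:
            return code_table, best
        code_table = order(code_table + [best_candidate])
```

Calling `mine` on a small list of transactions, for example ten copies of `["a", "b"]` and two of `["c"]`, returns the final code table and its total encoded size in bits. Because the loop can be stopped after any accepted candidate and still hold a valid solution, it reflects the any-time character mentioned in the abstract, under the simplifications stated above.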

Similar resources

Instance Driven Hierarchical Clustering of Document Collections

The global pattern mining step in existing pattern-based hierarchical clustering algorithms may result in an unpredictable number of patterns. In this paper, we propose IDHC, a pattern-based hierarchical clustering algorithm that builds a cluster hierarchy without mining for globally significant patterns. IDHC allows each instance to "vote" for its representative size-2 patterns in a way that e...

Supervised Descriptive Rule Discovery: A Unifying Survey of Contrast Set, Emerging Pattern and Subgroup Mining

This paper gives a survey of contrast set mining (CSM), emerging pattern mining (EPM), and subgroup discovery (SD) in a unifying framework named supervised descriptive rule discovery. While all these research areas aim at discovering patterns in the form of rules induced from labeled data, they use different terminology and task definitions, claim to have different goals, claim to use different...

Experience Management with Task-Configurations and Task-Patterns for Descriptive Data Mining

Determining and instantiating an appropriate data mining task and method is often rather difficult, especially for inexperienced users. In this paper, we present a methodological approach for capturing, reusing, and generalizing experiences (task-configurations) for an easier selection and instantiation of appropriate methods and tasks in descriptive data mining. We show how the cases describin...

Efficient Descriptive Community Mining

Community mining is applied in order to identify groups of users which share, e.g., common interests or expertise. This paper presents an approach for mining descriptive patterns in order to characterize communities in terms of their distinctive features: For an efficient discovery approach, we introduce optimistic estimates for obtaining an upper bound for the community quality. We present an ...

Modeling the Structural Relations among Communication Patterns, Love Schemas and Marital Commitment with Mediation of Love Styles

This study examined the relations among communication patterns, love schemas and marital commitment with mediation of love styles. This research had a descriptive-correlational design. The statistical population of the study comprised all married female teachers in public schools of Tabriz in 2017-18 school year. Multi-stage cluster sampling method was utilized and 237 teachers were selected. T...

Publication date: 2012